ABSTRACT

With the advent of reliable positioning technologies and prevalence 
of location-based services, it is now feasible to accurately study the 
propagation of items such as infectious viruses, pieces of sensitive
information, and malware through a population of moving objects,
e.g., individuals, mobile devices, and vehicles. In such application 
scenarios, an item passes between two objects when the objects are 
sufficiently close (i.e., when they are in so-called contact), and
hence once an item is initiated, it can penetrate the object
population through the evolving network of contacts among objects,
termed the contact network. In this paper, for the first time we define
and study reachability queries in large (i.e., disk-resident) contact 
datasets which record the movement of a (potentially large) set of 
objects moving in a spatial environment over an extended time pe- 
riod. A reachability query verifies whether two objects are "reach- 
able" through the evolving contact network represented by such 
contact datasets. We propose two contact-dataset indexes that
enable efficient evaluation of such queries despite the potentially
very large size of the contact datasets. With the first index, termed
ReachGrid, at query time only the small portion of the contact
network that is required for reachability evaluation is constructed
and traversed. With the second index, termed ReachGraph, we
precompute reachability at different scales and leverage these
precomputations at query time for efficient query processing. We
optimize the placement of both indexes on disk to enable efficient
index traversal during query processing. We study
the pros and cons of our proposed approaches by performing ex- 
tensive experiments with both real and synthetic data. Based on 
our experimental results, our proposed approaches outperform ex- 
isting reachability query processing techniques in contact networks 
by 76% on average. 

1. INTRODUCTION 

Studying how items such as infectious viruses, ideas and habits,
malware, and broadcast messages propagate through a population
of moving objects, e.g., individuals, mobile devices, or vehicles,
is of importance in a wide range of applications including public
health monitoring, social behavior analysis, computer security and
intelligent traffic monitoring, to name a few. In such application
scenarios, objects pass items among themselves once they are
sufficiently close, i.e., once they are in so-called contact.
Accordingly, once an item is initiated by an object, it can penetrate
the object population through the evolving network of contacts among
objects, termed the contact network. With such analysis, one can, for
instance, design public health interventions in order to control the
propagation of infectious diseases, or find the source(s) that
originally leaked sensitive information or initiated the spread of
malware.

Arguably, one of the main building blocks for item propaga- 
tion analysis in evolving contact networks is the ability to com- 
pute reachability queries which evaluate whether two objects are 
"reachable" through the evolving contact network. Previously, lack 
of accurate datasets that capture the contact networks has limited 
the accuracy and applicability of propagation analysis (and par- 
ticularly, reachability analysis) in contact networks, and previous 
studies have inevitably resorted to simplified contact network mod- 
els, or small-scale and inaccurate contact datasets. However, with 
the recent advances in developing accurate positioning devices and 
prevalence of location-based services, it is becoming possible to 
capture the location of objects at large scale and for extended
periods of time, resulting in very large contact datasets that capture
the history of object contacts accurately and with high
spatiotemporal resolution. In this paper, we focus on defining and
efficiently evaluating reachability queries in large-scale
(disk-resident) historic contact datasets, where the main challenge
is to reduce the computation time for query evaluation.

Consider the contact network depicted in Figure 1, which shows
the position of a set of objects at each time instance within the time
interval $T=[0,3]$. In this figure, two objects are connected by a
link if they are in contact; for instance, $o_1$ and $o_2$ are in
contact at time 0. The object $o_4$ is reachable from $o_1$ during the
time interval $[0,1]$. The reason is that if an item is initiated by
$o_1$ at time 0, it can pass from $o_1$ to $o_2$ at time 0 and then
from $o_2$ to $o_4$ at time 1. Note that in the same figure, $o_1$ is
not reachable from $o_4$ during $[0,1]$. Consider the following
examples of how reachability query evaluation plays a fundamental role
in analyzing item propagation through contact networks in the context
of some of the application scenarios mentioned above. As a first
example, assume a set of individuals $O$ is known to carry a dangerous
contagious virus. By performing a batch of reachability queries
between each individual in $O$ and the rest of the population, the
individuals who could have been directly or indirectly contaminated
within a certain time interval can be identified by determining the
set of individuals reachable from $O$ in the same time interval. Note
that this application requires running potentially numerous
reachability queries between pairs of individuals, which can be very
time consuming. On the other hand, timely medication administration
can save lives with most viral diseases. Next, imagine a set of
individuals $O$, e.g., criminals, who are on a watch list and need to
be monitored. Law enforcement agencies may need to discover those who
have potentially been in contact with any of the individuals in $O$.
Again, this requires performing a batch of reachability queries to
find all the individuals reachable from/to any individual in $O$. Such
analysis may help in preventing new crimes and in analyzing the
relationships between criminals.

Figure 1: Object positions and contacts between them during
the time interval [0,3]

Graph reachability problems, which verify whether a path exists
between two given vertices of a graph, have been extensively studied
in recent years [18, 19]. Our problem is different from the existing
work on graph reachability in two ways. First, while previous work 
on graph reachability assumes the graph is memory-resident, we fo- 
cus on very large disk-resident contact networks. Accordingly, we 
study how to index the contact network on disk to enable efficient 
query processing. Second, with our problem objects are associated 
with time and space information as they move in an environment 
over time. We show that we can leverage such information for ef- 
ficient reachability query processing, whereas the existing work on 
graph reachability only focuses on datasets that are modeled by ab- 
stract graphs with no connection to space and time. 

In this paper, we propose two index structures for indexing con- 
tact networks, namely ReachGrid and ReachGraph. Consider a 
reachability query which verifies whether an object (query source) 
can reach another object (query destination) through the contact 
network, if we consider only the contacts occurring during a given 
query time interval (query interval). With ReachGrid, our approach 
is to compute reachability on-the-fly by expanding the contact net- 
work starting from the query source. However, the naive expansion 
of the network is prohibitively costly. Instead, to enable "guided" 
expansion, we leverage the following simple and powerful
observation about contact networks: only contacts that occur in the
same spatial and temporal locality are relevant for exploration, and
therefore exploration of the contacts can be guided through relevant
spatiotemporal localities and can avoid other localities for enhanced
performance. In particular, with ReachGrid we propose a
spatiotemporal grid to index all contacts in the contact network dataset
into distinct spatiotemporal localities. At the query time, this in- 
dex is used to guide on-the-fly expansion of the contact network to 
verify reachability. 

On the other hand, with ReachGraph we use the alternative ap- 
proach of precomputing the reachability between objects. It is im- 
practical to precompute reachability for all combinations of query 
source, destination and interval. Therefore, we propose to
precompute reachability only for carefully selected combinations of
query source, destination and interval, and leverage these
combinations to compute reachability for all other combinations
on-the-fly. In turn, at query time this allows recursively breaking
the given reachability query into a set of precomputed reachability
queries for efficient query processing.

Finally, with both ReachGrid and ReachGraph, the placement of
the index on disk can significantly affect the efficiency of query
processing. A naive approach of placing index elements (graph nodes
and grid cells) on random disk blocks significantly deteriorates
query efficiency. Accordingly, guided by the following two
observations, we develop enhanced disk placement approaches for
ReachGrid and ReachGraph. First, contacts are processed ordered by
occurrence time during query processing. Second, during index
traversal, an object o' is traversed after o if o' is reachable from
o. We present our proposed disk placement approaches for ReachGrid
and ReachGraph in Sections 4 and 5, respectively.

While ReachGrid evaluates reachability by sweeping contacts 
along space and time dimensions, ReachGraph computes reachabil- 
ity by traversing a connectivity graph. Accordingly, one can expect 
ReachGrid to be comparable with ReachGraph when query time in- 
terval is small, and vice versa. This expectation is confirmed by our 
empirical study in Section 6. Moreover, our proposed approaches 
outperform the existing reachability query processing algorithms 
by 76% on average. 

The rest of the paper is organized as follows. The related work is 
outlined in Section 2. We formally define reachability query in con- 
tact networks in Section 3. We present ReachGrid and ReachGraph 
indexing techniques in Sections 4 and 5, respectively. Section 6 
presents our experimental results. We discuss extensions of our 
reachability problem in Section 7. Finally, we conclude the paper 
and discuss the possible future work in Section 8. 

2. RELATED WORK 

We review the related work in four categories: graph reachabil- 
ity, trajectory indexing and trajectory join, external graph traversal 
and graph indexing and finally, contact networks analysis. 

2.1 Graph Reachability 

Given two vertices u and v in a directed graph G, graph
reachability verifies whether there is a path from u to v [19, 18]. Although
we also reduce our problem to graph reachability by converting the 
contact network into a hypergraph, our problem is different from 
previous work on graph reachability in several ways. First, in con- 
trast with the previous work where the focus is on memory-resident 
graphs, we consider disk-resident graphs. Second, we focus on 
"spatiotemporal" graphs and accordingly leverage the spatial and 
temporal properties of such graphs for enhanced index construc- 
tion and graph traversal. In particular, our graph vertex may repre- 
sent multiple objects and moreover an object can be associated with 
multiple vertices. Finally, our proposed multi-resolution graph in- 
dexing and bidirectional graph traversal approaches are unique and 
novel, allowing for unprecedented improvement in the efficiency of 
state of the art reachability query processing approaches. 

2.2 Trajectory Join and Trajectory Indexing 

The research on moving objects data management has traditionally
focused mainly on range and nearest neighbor queries. Recently,
trajectory join has also been studied [2, 1]. The problem of
Closest-Point-of-Approach (CPA) is proposed and studied in [1].
Given a set of trajectories, CPA finds the pairs of objects whose
closest distance is less than d. Although the CPA problem is different
from trajectory join, the solution to the CPA problem can be adopted
to solve trajectory join. Although we use trajectory join algorithms
in constructing the contact network, our focus is on indexing a con- 
tact network for efficient reachability query processing. Another 
relevant body of related work on trajectory processing is trajec- 
tory indexing [5] which focuses on indexing trajectories for effi- 
cient processing of range queries and its variations. In contrast, our 
problem is how to index a contact network for efficient reachabil- 
ity query processing which is much more complex as compared to 
range query and its variations.

Shortest path on graphs [13, 7] is another body of related work.
Given a graph G=(V,E), the shortest path finds the optimally short- 
est path assuming a traveling cost between each pair of graph ver- 
tices. In contrast, with reachability query we are only interested in 
verifying whether any contact path exists between two objects. 

2.3 External Graph Traversal and Graph Indexing

With external memory graph traversal [12, 17], researchers have
extended the classic graph traversal approaches such as Depth- 
First-Search (DFS) and Breadth-First-Search (BFS). As mentioned 
earlier, both DFS and BFS can be leveraged to answer reachabil- 
ity queries. However, with our work we try to avoid unneces- 
sary expansion of the graph nodes by designing an efficient multi- 
resolution index structure and traversal approaches. 

Another category of work focuses on indexing temporal graphs. 
Time expanded network (TEN) and Time aggregated network 
(TAN) [14] are two models to represent time varying networks. 
TEN represents the time dependence by instantiating a snapshot 
of the network at every time instance. TAN extends TEN where 
the time varying attributes are further aggregated over edges and 
vertices. We utilize TEN to initially model a contact network but 
afterward convert it to a more complex index structure as discussed 
in Section 5. Recently, [8] studied efficient indexing of
spatiotemporal networks represented by TEN. However, the focus of
that work is on indexing techniques that enable efficient processing
of route evaluation and retrieval queries, as opposed to our work,
which focuses on the complex reachability query processing.

2.4 Contact Networks Analysis 

Recent studies [16, 10] have focused on analyzing characteristics
of contact networks, such as the average contact path length between
two objects or the time duration until two objects contact each
other again. This area of work is orthogonal
to our work as we are focusing on indexing a contact network for 
efficient reachability query processing. 

Routing in delay-tolerant networks (DTN) which lack continu- 
ous network connectivity is another body of relevant work [9]. The 
difference between this body of work and our work is twofold.
First, the goal of routing in DTN is to find a best path from a source
to a destination node based on a cost metric such as message
delivery ratio. Second, our reachability query is associated with a
time interval parameter which is leveraged during index construction
and query processing to enable efficient reachability query processing.

3. PROBLEM DEFINITION 

In this section, we first define contact network and afterward for- 
malize the reachability query in a contact network. 

3.1 Contact Network 

Consider a set of objects $O$ moving in an environment $E$. We say
a contact $c=\{o_i, o_j\}$ has happened between two objects
$o_i, o_j \in O$ when they are within a sufficiently close distance to
transmit an item, i.e., when their distance is less than a threshold
$d_T$. The value of $d_T$ depends on the application of interest. For
example, for disease propagation through human populations $d_T$ is on
the order of meters, while for Bluetooth data transfer among a set of
mobile devices $d_T$ is on the order of a hundred meters. We call
$o_i$ and $o_j$ the contacting objects during $c$, and we define the
time interval $T_c$ within which the contact persists as the validity
interval of $c$.

Consider a time interval $T$ during which objects in $O$ are moving
in an environment $E$ and making various contacts over time. The
movement of each object $o \in O$ can be modeled by the trajectory
of $o$, which captures the position of $o$ at each time instant
$t \in T$. We term the collection of contacts between pairs of objects
in $O$ during the time interval $T$ the contact network of $O$ during
$T$ and represent it by $C$. For example, in Figure 1,
$c_1=\{o_1,o_2\}$, $c_2=\{o_2,o_4\}$, $c_3=\{o_3,o_4\}$ and
$c_4=\{o_1,o_2\}$ are the contacts occurring during $T=[0,3]$, having
validity intervals $T_{c_1}=[0,0]$, $T_{c_2}=[1,1]$, $T_{c_3}=[1,2]$
and $T_{c_4}=[2,3]$. Notice that we differentiate $c_1$ and $c_4$
although they have the same contacting objects, because by definition
a validity interval is required to be continuous.
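The distinction between contacts with the same contacting objects can
be illustrated with a minimal sketch. It assumes discrete integer time
instants and per-instant contact observations; the function name
validity_intervals and the tuple layout are hypothetical and are not
taken from the paper.

```python
def validity_intervals(observations):
    """Group per-instant observations (obj_a, obj_b, t) into contacts.

    Each returned contact is ((obj_a, obj_b), (start, end)), where
    [start, end] is a maximal run of consecutive integer instants.
    """
    by_pair = {}
    for a, b, t in observations:
        by_pair.setdefault(frozenset((a, b)), set()).add(t)

    contacts = []
    for pair, times in by_pair.items():
        ordered = sorted(times)
        start = prev = ordered[0]
        for t in ordered[1:]:
            if t != prev + 1:  # a gap closes the current validity interval
                contacts.append((tuple(sorted(pair)), (start, prev)))
                start = t
            prev = t
        contacts.append((tuple(sorted(pair)), (start, prev)))
    return sorted(contacts)


# Observations corresponding to the Figure 1 example.
figure1_observations = [
    ("o1", "o2", 0), ("o2", "o4", 1), ("o3", "o4", 1),
    ("o3", "o4", 2), ("o1", "o2", 2), ("o1", "o2", 3),
]
figure1_contacts = validity_intervals(figure1_observations)
```

Under these assumptions, the pair (o1, o2) yields two separate
contacts, one valid during [0, 0] and one during [2, 3], because the
pair is not in contact at time 1.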

3.2 Reachability Query 

Consider a contact network $C$ which is constructed based on the
history of movement of objects $O$ in an environment $E$ during a
time interval $T$. Given a pair of objects $(o_i, o_j)$,
$o_i, o_j \in O$, and a time interval $T_p \subset T$, the
reachability query $q$ verifies whether there exists a contact path
$p_{ij}$ from $o_i$ to $o_j$ during time interval $T_p$. Intuitively,
a contact path between two objects $o_i$ and $o_j$ consists of a
sequence of contacts in the contact network $C$ through which any
virtual item $i$ can travel the network to go from $o_i$ to $o_j$. We
define a contact path from object $o_i$ to object $o_j$ as a series of
contacts $(c_1, c_2, \ldots, c_n)$ in $C$, where $T_{c_i}$ overlaps
$T_p$ ($1 \le i \le n$), and for each pair of contacts $c_i$ and
$c_{i+1}$ ($1 \le i \le n-1$) we have 1) the contacts share an object,
i.e., if $c_1=\{o_1,o_2\}$ and $c_2=\{o_3,o_4\}$ then $o_2=o_3$, and
2) $T_{c_i}$ starts before $T_{c_{i+1}}$ in time.

We call $o_i$, $o_j$ and $T_p$ the query source, query destination and
query interval, respectively, and denote such a query by
$q: o_i \xrightarrow{T_p} o_j$.
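A reachability query can be sketched as a sweep over time. The sketch
below is not the paper's algorithm. It assumes discrete integer time
instants and, as a simplification of the formal ordering condition on
contact start times, it lets an item traverse several contacts that
are active at the same instant. The names Contact and reachable are
hypothetical.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    a: str
    b: str
    start: int  # first instant of the validity interval
    end: int    # last instant of the validity interval


def reachable(contacts, source, dest, t_start, t_end):
    """Return True if dest can receive an item initiated by source."""
    holders = {source}
    for t in range(t_start, t_end + 1):
        active = [c for c in contacts if c.start <= t <= c.end]
        changed = True
        while changed:  # propagate to a fixpoint within instant t
            changed = False
            for c in active:
                if (c.a in holders) != (c.b in holders):
                    holders.update((c.a, c.b))
                    changed = True
        if dest in holders:  # stop as soon as the destination is reached
            return True
    return dest in holders


# Contacts of the Figure 1 example with their validity intervals.
FIGURE1 = [
    Contact("o1", "o2", 0, 0),
    Contact("o2", "o4", 1, 1),
    Contact("o3", "o4", 1, 2),
    Contact("o1", "o2", 2, 3),
]
```

With the Figure 1 contacts, reachable(FIGURE1, "o1", "o4", 0, 1)
holds while reachable(FIGURE1, "o4", "o1", 0, 1) does not, which
matches the asymmetry discussed in Section 1.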

4. REACHGRID 

To evaluate $q: o_i \xrightarrow{T_p} o_j$, one approach is to first
materialize the contact network $C'$, which captures all contacts that
have occurred during $T_p$. It is obvious that other contacts are
irrelevant to processing $q$. One can construct $C'$ as follows.
Suppose the trajectory of an object $o_i \in O$ during $T$ is
represented by $r_i=\{(v_1,t_1), \ldots, (v_n,t_n)\}$, which is a
sequence of position-vector and time stamp pairs $(v_j,t_j)$, where
$v_j$ is the position vector of $o_i$ at time $t_j \in T$.
Accordingly, a segment $r_i(w)$ of a trajectory $r_i$ during a time
window $w$ is defined as the subset of $(v_j,t_j)$ pairs from $r_i$
whose timestamps belong to $w$, i.e.,
$r_i(w)=\{(v_j,t_j) \mid t_j \in w\}$. Assume that the set of
trajectory segments from all moving objects $o \in O$ during $T_p$ is
denoted by $R(T_p)$, i.e., $R(T_p)=\{r_i(T_p)\}$. A window trajectory
join between two sets of trajectories $P$ and $Q$, denoted by
$P \bowtie_{d_T} Q$, returns tuples $(p,q,w)$ where $p \in P$ and
$q \in Q$ are within the distance $d_T$ during $w$. $C'$ can be
constructed by performing a self spatiotemporal join on $R(T_p)$,
i.e., $R(T_p) \bowtie_{d_T} R(T_p)$, and subsequently creating a
contact between objects $o_i$ and $o_j$ at time $t$ if the join result
includes $(o_i,o_j,w)$ where $t \in w$. Once generated, $C'$ can be
traversed to identify any existing contact path between $o_i$ and
$o_j$.
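The materialization step can be illustrated with a naive quadratic
self-join. This is only a sketch under simplifying assumptions: it
uses two-dimensional positions sampled at shared discrete instants and
checks every object pair, whereas the join algorithms cited in the
paper are more sophisticated. The name window_self_join and the
dictionary layout are hypothetical.

```python
from itertools import combinations
from math import dist


def window_self_join(trajectories, window, d_t):
    """Return (obj_a, obj_b, t) for every pair closer than d_t at t.

    trajectories maps an object id to {t: (x, y)}; window is an
    inclusive (t_lo, t_hi) pair of time instants.
    """
    t_lo, t_hi = window
    contacts = []
    # Object ids are distinct, so sorting the items orders them by id.
    pairs = combinations(sorted(trajectories.items()), 2)
    for (id_a, traj_a), (id_b, traj_b) in pairs:
        for t in sorted(set(traj_a) & set(traj_b)):
            if t_lo <= t <= t_hi and dist(traj_a[t], traj_b[t]) < d_t:
                contacts.append((id_a, id_b, t))
    return contacts


sample_trajectories = {
    "o1": {0: (0, 0), 1: (0, 0)},
    "o2": {0: (1, 0), 1: (5, 0)},
    "o3": {0: (10, 10), 1: (5.5, 0)},
}
sample_contacts = window_self_join(sample_trajectories, (0, 1), 2.0)
```

The per-instant tuples produced here would still need to be merged
into contacts with continuous validity intervals before traversal.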

Although the aforementioned approach correctly answers
reachability queries, it can be very inefficient due to redundant
processing. In particular, one may not need to consider all the
contacts in $C'$ to process a query $q$ in two cases. First, all
contacts between objects which are not reachable from the query source
$o_i$ during the query interval $T_p$ are irrelevant to $q$. For
example, for Figure 1 and $q: o_1 \xrightarrow{[2,3]} o_2$, it is
unnecessary to process the contact between $o_3$ and $o_4$ as neither
can possibly be reachable from $o_1$ during $[2,3]$. Second, we
observe that $o_j$ may be reachable from $o_i$ during
$T'_p \subset T_p$ where $|T'_p| \ll |T_p|$. In this case, the
contacts whose validity intervals do not overlap $T'_p$ are irrelevant
to $q$ and redundant for query processing. For example, for Figure 1
and $q: o_1 \xrightarrow{[0,3]} o_4$, there is no need to process the
contacts occurring during $[2,3]$ as $o_4$ is reachable from $o_1$
during $[0,1]$.

Inspired by the aforementioned observations, we introduce an
efficient query processing approach that, given a reachability query
$q$, tries to construct only the portion of $C'$ which is necessary
for processing $q$. To this end, first during an offline phase we
construct a spatiotemporal index structure, dubbed ReachGrid.
ReachGrid enables pruning most of the contacts irrelevant to the query
$q$. During the online processing phase, we incrementally find the
objects reachable from the query source, in the order in which they
become reachable when sweeping over the query interval. We stop the
process either when the query destination is discovered to be
reachable from the query source, or when all the contacts occurring
during the query interval between objects reachable from the query
source have been processed.

4.1 Index Construction 

ReachGrid leverages the locality of objects over space and time
to avoid traversing contacts irrelevant to a reachability query. It
leverages temporal locality to stop query processing as soon as a
contact path between query source and destination is discovered
when traversing the contacts ordered by their occurrence time. To
this end, the object trajectory segments are grouped based on the
time stamps of the position-vector pairs in the object trajectories. A
contact between two objects occurs when they are in close proximity.
Therefore, grouping the objects based on spatial locality tends to
place objects that are in contact over time in the same group. This
enables traversing only a subset of groups, namely those containing
the objects reachable from the query source, when processing the
query. ReachGrid exploits temporal and spatial locality by imposing
two grids on the object trajectories. The first grid partitions the
time interval $T$ (the time interval during which all the contacts in
$C$ occurred). The second grid spatially partitions the trajectory
segments within each time interval in $T$.

We construct ReachGrid as follows. First, we partition the time
interval $T$ into a set of disjoint time intervals, i.e.,
$T=(T_1, \ldots, T_n)$. Next, we spatially partition the trajectory
segments during each $T_i$, i.e., the trajectory segments in
$R(T_i)$, based on locality. To this end, for each time interval $T_i$
we impose a grid $C_i$ on the environment $E$, which subsequently
partitions the trajectory segments in $R(T_i)$. In this way, a grid
cell $c$ in $C_i$ includes the trajectory segments which span the area
represented by $c$. Notice that a trajectory segment
$r_j(T_i) \in R(T_i)$ may span multiple cells of $C_i$. The
resolutions of the temporal and spatial grids depend on the input
contact network and query workload, and we select them empirically in
Section 6.
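The two-level partitioning can be sketched with an in-memory
dictionary. This is only an illustration under stated assumptions:
two-dimensional positions, integer time instants, fixed-length
temporal partitions and square cells. The name build_reach_grid and
the parameters bucket_len and cell_size are hypothetical, and disk
placement is not modeled.

```python
from collections import defaultdict


def build_reach_grid(trajectories, bucket_len, cell_size):
    """Map (time_bucket, cell_x, cell_y) to time-ordered samples.

    trajectories maps an object id to {t: (x, y)}. Each sample is
    stored as (t, obj, (x, y)) in the cell that contains its position.
    """
    grid = defaultdict(list)
    for obj, samples in trajectories.items():
        for t, (x, y) in samples.items():
            key = (t // bucket_len, int(x // cell_size), int(y // cell_size))
            grid[key].append((t, obj, (x, y)))
    for entries in grid.values():
        entries.sort()  # order samples within a cell by time stamp
    return dict(grid)


example_grid = build_reach_grid(
    {"o1": {0: (0.5, 0.5), 3: (1.5, 0.5)}, "o2": {1: (0.2, 0.9)}},
    bucket_len=3,
    cell_size=1.0,
)
```

A trajectory segment that crosses several cells simply contributes
samples to each of them, which corresponds to a segment spanning
multiple cells of the grid.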

An example of a constructed index is shown in Figure 2, where
$T$ is partitioned into six time intervals. Furthermore, a 4 x 4 grid
is imposed on the environment to spatially partition the trajectory
segments during $T_0$ and $T_1$. $T_0$ and $T_1$ have three and two
time instances, respectively. The grid cells for the first two time
intervals, i.e., the grids $C_0$ and $C_1$, are shown, while the rest
are omitted for clarity of illustration. Three different objects are
in $O$ and are represented by a circle, a square and a triangle over
time.

As query processing progresses by exploring trajectory segments
in spatial grid cells, we propose to place the trajectories in a cell
$c$ in $C_i$ on consecutive blocks on disk to enable efficient
retrieval of the necessary trajectory segments during query
processing. Moreover, the position-vector and time stamp pairs
$(v,t)$ of trajectory segments in $c$ are placed on disk ordered by
their time stamps. This avoids processing all the trajectory segments
within $c$, since processing can stop as soon as a contact path
between query source and destination is discovered. Accordingly, the
placement on disk of the cells in different time grids, i.e., the
cells in $C_i$ versus the cells in $C_j$ where $i < j$, should be
decided. Based on the same goal of early query processing termination,
we place the cells in $C_i$ before the cells in $C_j$ on disk.




Figure 2: ReachGrid Index Example

Figure 3: ReachGrid Query Processing

4.2 Query Processing 

Query processing aims to incrementally find the objects reachable
from the query source by sweeping the query interval. To this end,
at the beginning of query processing, the query interval is broken
into a set of time intervals by imposing the temporal grid
constructed in the previous section, i.e.,
$T_p=(T_j, \ldots, T_k)$. Afterward, the grid cell $c$ in $C_j$ which
includes the query source position at the beginning of the query
interval is located. This can be executed in a constant number of
I/Os assuming that an external hash table maps each object to its
trajectory over time. We call the set of objects reachable from the
query source during query processing the seed set. Initially, the
seed set includes only the query source. To process the reachability
query, the algorithm iterates over each $T_i$ in $T_p$ and discovers
new seeds. To this end, at the beginning of $T_i$ the grid cells which
include the current seeds are located. Subsequently, objects which
are reachable from at least one of the seeds during $T_i$ are found
and added to the seed set. Notice that as soon as a new object
reachable from the query source is discovered, it is added to the
seed set and hence the process continues with the updated seed set.
The order in which new seeds are discovered is based on the time
order in which they become reachable from any of the current seeds.
In some cases, $T_j$ may be an interval whose start point is
different from the query interval start point. In these cases, we
start processing $T_j$ from the query interval starting point. We
stop query processing if the query destination is added to the seed
set or when the entire query interval is processed.

The main step in query processing is discovering new seeds
during each $T_i$, $j \le i \le k$. Assume that the set of current
seeds at the beginning of $T_i$ is $S_i$. The goal is to discover
$S_{i+1}$, i.e., the set of seeds at the beginning of $T_{i+1}$,
which is the same as that at the end of $T_i$. Let the set of grid
cells in which the seeds in $S_i$ are located be denoted by
$C_{S_i}$. We first discover all the other cells which may contain an
object $o$ in contact with a seed during $T_i$. We call such cells
potential seed cells and denote them by $N_i$. The cells within $N_i$
can be found efficiently by creating the minimum bounding regions
(MBRs) of the trajectory segments of objects in $S_i$ and then
finding the cells which are within a distance of at most $d_T$ from
those MBRs. During query processing, whenever $N_i$ is updated, the
first object $o'$ in $N_i$ is discovered which is not in $S_i$ but
becomes reachable from any of the seeds in $S_i$. Intuitively, we
propagate a virtual item $i$ from the objects in the seed set at the
beginning of $T_i$ and find the first object which receives $i$. This
can be done by performing a spatiotemporal join which works by
sweeping time during the join interval. Consequently, we add $o'$ to
$S_i$ and accordingly update $N_i$. Assume that $o'$ is discovered
reachable from a seed during $[t_1,t']$ ($T_i=[t_1,t_2]$). We
continue the process recursively with the updated sets but during
$[t',t_2]$. Notice that during $T_i$, the retrieved cells are
buffered to prevent unnecessary future retrievals from disk and are
discarded at the end of $T_i$.

An example is shown in Figure 3 for query processing during a
$T_i$. The locations of objects $o_1$, $o_2$, $o_3$ and $o_4$ at time
instances $t_0$, $t_1$ and $t_2$, $t_0 < t_1 < t_2$, during $T_i$ are
highlighted. The trajectories for each object are shown by links
connecting positions at the aforementioned time stamps. Assume the
query source and destination are $o_1$ and $o_2$, respectively, and
the query interval is $[t_0,t_2]$. The shaded area around the
trajectory segment of an object $o$ denotes the MBR of the trajectory
segment with the width of $d_T$. This MBR shows that $o$ is in the
seed set $S_i$, and any other object whose trajectory is within the
MBR of the trajectory segment of $o$ will make a contact with $o$ and
be added to $S_i$. At $t_0$, $S_i$ contains $o_1$. At $t_1$, $o_1$ and
$o_3$ make a contact and hence $o_3$ is added to $S_i$. During
$[t_1,t_2]$ both $o_1$ and $o_3$ are in $S_i$. Finally, at $t_2$ the
cells $c_1$ and $c_2$ in which $o_2$ and $o_4$ are located,
respectively, are added to $N_i$ and subsequently, $o_2$ is added to
$S_i$. Therefore, during $[t_0,t_2]$ the query destination is
reachable from the query source. For ease of illustration, we only
discussed how $N_i$ changes at $t_2$ in this example.

The entire online processing step is summarized in Algorithm 1.
The algorithm takes as input the query source, destination, interval
and the index constructed during the offline process. First, the
query interval is quantized into time intervals from $T$. Afterward,
the initialization is performed in lines 2-5. The algorithm iterates
over each $T_i$ in $T_p$, and for each $T_i$ it performs a join in
line 9 to find the first object reachable from a seed during the
interval $w$. $R_{C_{S_i}}(w)$ denotes the set of object trajectory
segments during $w$ which span the cells in $C_{S_i}$. We adopt the
join approach in [1], which sweeps the time interval $w$ and
terminates whenever a new object, not in the seed set and reachable
from the query source, is discovered. Consequently, the sets are
updated in line 10. Finally, the algorithm terminates when $o_j$ is
added to the seed set or all the intervals in $T_p$ are processed.

Assume each cell of $C_j$ includes the trajectories of $n_c$ distinct
objects on average and each disk block contains $b_c$ cells of $C_j$
on average. Finally, assume $T'_p=[t_1,t] \subseteq T_p=[t_1,t_2]$ is
the smallest time interval during which the query destination is
reachable from the source. If the query destination is not reachable
from the query source during $T_p$, we assume $T'_p=T_p$. The
following theorem states the complexity of ReachGrid query processing
and index construction.

Theorem 4.1. ReachGrid can be constructed with $O(|O||T|)$ I/Os.
The I/O complexity of query processing is $O(\ldots)$.

We skip the details of the proof due to lack of space. 
Algorithm 1 Query Processing
procedure QUERYPROCESSING($o_i$, $o_j$, $T_p$, $I$)
    $S_j = \{o_i\}$                                > Initializing the seed set
    $C_{S_j}$ = FindCells($S_j$, $t$)              > Find the cells containing the seed
    $C_{S_j}$ = Update($C_{S_j}$, $N_j$)           > Update $C_{S_j}$ based on the cells in $N_j$
    for $i = j$ to $k$ do
        $w = T_i = [t_1, t_2]$
        repeat
            $(o', t') = R_{C_{S_i}}(w) \bowtie_{d_T} R_{C_{S_i}}(w)$    > $o' \notin S_i$ and is reachable from a seed
            $w = [t', t_2]$
            Update $N_i$, $C_{S_i}$ and $S_i$
        until $o'$ = NULL or $o' = o_j$
        if $o_j \in S_i$ then                      > Termination condition
            Return 'reachable'
        end if
    end for
    Return 'not reachable'
end procedure
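The seed-set expansion can be illustrated with a simplified in-memory
sketch. It is not a transcription of Algorithm 1. It assumes integer
time instants with one temporal partition per instant,
two-dimensional positions, and that an item may pass through several
contacts within the same instant. Instead of MBRs it inspects the
cells within ceil(d_t / cell_size) of each seed's cell. The names
build_index and grid_reachable are hypothetical.

```python
from collections import defaultdict
from math import ceil, dist


def build_index(trajectories, cell_size):
    """index[t][cell] lists the (obj, pos) samples located in cell at t."""
    index = defaultdict(lambda: defaultdict(list))
    for obj, samples in trajectories.items():
        for t, pos in samples.items():
            cell = (int(pos[0] // cell_size), int(pos[1] // cell_size))
            index[t][cell].append((obj, pos))
    return index


def grid_reachable(trajectories, index, cell_size, d_t,
                   source, dest, t_start, t_end):
    seeds = {source}
    span = ceil(d_t / cell_size)  # cells a contact can reach across
    for t in range(t_start, t_end + 1):
        # Seeds with a known position at t start the expansion.
        frontier = [s for s in seeds if t in trajectories[s]]
        while frontier:
            seed = frontier.pop()
            sx, sy = trajectories[seed][t]
            cx, cy = int(sx // cell_size), int(sy // cell_size)
            # Only cells near the seed can hold a contacting object.
            for nx in range(cx - span, cx + span + 1):
                for ny in range(cy - span, cy + span + 1):
                    for obj, pos in index[t].get((nx, ny), []):
                        if obj not in seeds and dist((sx, sy), pos) < d_t:
                            seeds.add(obj)
                            frontier.append(obj)
        if dest in seeds:  # early termination
            return True
    return dest in seeds


demo_trajectories = {
    "o1": {0: (0, 0), 1: (0, 0)},
    "o2": {0: (1, 0), 1: (5, 0)},
    "o3": {0: (10, 10), 1: (5.5, 0)},
}
demo_index = build_index(demo_trajectories, 2.0)
```

In this toy dataset, o3 is reachable from o1 during [0, 1] through o2,
while o1 is not reachable from o3 in the same interval, and cells far
from every seed are never inspected.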



5. REACHGRAPH 

In this section, we first present ReachGraph index construction 
steps and thereafter discuss ReachGraph query processing. 

5.1 Index Construction 

To construct the ReachGraph for a given contact network $C$, we
start from $C$ and apply a series of transformations that eventually
convert it to the ReachGraph hypergraph $H_n$. The transformations
are performed in two phases, namely the reduction phase and the
augmentation phase. First, we observe that in a contact network $C$
one can identify disjoint subsets of nodes, where all nodes in a
subset are equivalently reachable or not reachable to/from any other
node $v$ in $C$. Accordingly, in the reduction phase we precompute
these subsets and reduce all nodes in each subset (along with their
connections) to a single hypernode. We call the resulting hypergraph
$D_n$, which is significantly smaller than $C$. Next, in the
augmentation phase, to further improve ReachGraph we precompute the
reachability between pairs of nodes in $D_n$ at predefined time
intervals. We perform this precomputation at several time resolutions
and accordingly augment $D_n$ with a hierarchy of extra links to
generate the ReachGraph hypergraph $H_n$. With $H_n$, a reachability
query can be effectively broken into a set of precomputed
reachability queries for real-time query answering.

There are two principles in the disk placement of $H_N$ vertices which
can improve query processing. First, an efficient placement should
place vertices which are reachable from each other on the same disk
block. In this way, while retrieving a vertex during query
processing, a set of vertices which should be retrieved in the future
are read and buffered as well. Second, there is an inherent order in
how the vertices of $H_N$ are traversed during query processing, which
should be leveraged when storing $H_N$ on disk. This ordering is
enforced by the time order in which the contacts in the vertices of
$H_N$ occur. We explain how to consider these two principles in
storing $H_N$ on disk to enable efficient query processing.

In the rest of this section, we first present our model for C as a
so-called time expanded network. Next, we explain the aforementioned
transformations $C \xrightarrow{\text{Reduction}} D_N$ and
$D_N \xrightarrow{\text{Augmentation}} H_N$ in detail. Finally, we
discuss how to store $H_N$ on disk.

5.1.1 Contact Network Model 

We represent a contact network C with the Time Expanded Network (TEN)
model [14]. TEN captures the time dependency of a network by
including a separate instance of the network at each time instance.
Accordingly, each object $o_i$ at time instance $t \in T$ is
associated with a separate vertex $o_i(t)$. To capture contacts, a
bidirectional edge $e=(o_i(t),o_j(t))$ is introduced between $o_i(t)$
and $o_j(t)$ if they are in contact at time t. Such an edge captures
the fact that an item can transfer from $o_i$ to $o_j$ at t. Note that
we assume the transfer delay is negligible and hence e is
bidirectional. Moreover, an edge is introduced between vertices
corresponding to the same object at consecutive time instances, i.e.,
an edge $e=(o_i(t),o_i(t+1))$ is created between $o_i(t)$ and
$o_i(t+1)$ at each time t. In this case, e is a directed edge which
shows that $o_i$ can hold an item during $[t,t+1]$. We define a graph
$G_t$ of all vertices and edges at time t, i.e., $G_t=(V,E)$ where
$V=\{o_i(t) \mid o_i \in O\}$, as a snapshot of C at t.

Figure 4 (a) shows an example C which corresponds to the contact
network in Figure 1. With $G_0$ in Figure 4 (a),
$V=\{o_1(0),o_2(0),o_3(0),o_4(0)\}$ and $E=\{(o_1(0),o_2(0))\}$. It is
easy to observe that $o_j$ is reachable from $o_i$ during
$T_p=[t_1,t_2]$ if and only if there is a path from $o_i(t_1)$ to
$o_j(t_2)$. This path represents the contact path from $o_i$ to $o_j$
during $T_p$. For example, in Figure 4 (a), $o_4$ is reachable from
$o_1$ during $T_p=[0,1]$ given the path
$(o_1(0),o_2(0),o_2(1),o_4(1))$.
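
The path condition above can be checked directly on a small in-memory TEN. The following is a minimal sketch rather than the disk-based method of this paper: the function name `ten_reachable`, the input format (a dictionary from time instance to contact pairs) and the toy contacts are assumptions made for illustration only.

```python
from collections import deque

def ten_reachable(contacts, src, dst, t1, t2):
    """Return True if dst is reachable from src during [t1, t2].

    contacts: dict mapping a time instance t to a list of (a, b) pairs
              that are in contact at t (hypothetical input format).
    """
    # Bidirectional contact edges inside each snapshot G_t.
    adj = {}
    for t, pairs in contacts.items():
        for a, b in pairs:
            adj.setdefault((a, t), []).append((b, t))
            adj.setdefault((b, t), []).append((a, t))

    start, goal = (src, t1), (dst, t2)
    seen = {start}
    queue = deque([start])
    while queue:
        obj, t = queue.popleft()
        if (obj, t) == goal:
            return True
        # Contact edges at the same time instance.
        nxt = list(adj.get((obj, t), []))
        # Directed "holding" edge o(t) -> o(t + 1), kept inside [t1, t2].
        if t < t2:
            nxt.append((obj, t + 1))
        for v in nxt:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Toy contacts (assumed): o1-o2 at t=0, then o2-o4 and o3-o4 at t=1.
toy = {0: [("o1", "o2")], 1: [("o2", "o4"), ("o3", "o4")]}
print(ten_reachable(toy, "o1", "o4", 0, 1))  # True
print(ten_reachable(toy, "o4", "o1", 0, 1))  # False
```

The second call illustrates that reachability over an interval is not symmetric, because holding edges only point forward in time.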
Figure 4: TEN model of C (a) and the corresponding DAG $D_N$ (b)

Figure 5: $D_N$ at the end of the reduction step

Figure 6: $D_{N_3}$ for $H_N$ whose $D_{N_1}$ is the graph in Figure 5



5.1.2 Transforming the Contact Network

5.1.2.1 Transforming by Reduction. 

In the reduction phase, we perform two distinct steps to convert C
into a hyper graph $D_N$ of significantly smaller size. Reducing C
makes it more efficient to traverse when finding possible contact
paths during query processing. Notice that these reduction steps are
lossless and preserve the accuracy of query processing. We first
state two properties which are utilized for reduction.

Property 5.1. [Snapshot Symmetry] If $o_j$ is reachable from $o_i$
during a time instance t, i.e., query interval $T_p=[t,t]$, then
$o_i$ is reachable from $o_j$ during the same interval.

Property 5.2. [Transitivity] Suppose $o_j$ is reachable from $o_i$
during $T_p=[t_1,t_2]$ and $o_k$ is reachable from $o_j$ during
$T'_p=[t'_1,t'_2]$. If $t_2 < t'_1$ then $o_k$ is reachable from
$o_i$ during $[t_1,t'_2]$.

At the first step of the reduction phase, the idea is to precompute
and materialize the reachability between objects at each time
instance t. According to Properties 5.1 and 5.2, the connected
components of each snapshot $G_t$ capture the sets of objects that
are reachable from each other at t. For instance, in Figure 4(b),
$c_4=\{o_2(1),o_3(1),o_4(1)\}$, which captures the fact that objects
$o_2$, $o_3$ and $o_4$ are all reachable from each other at time
instance t=1. Furthermore, if one object from a connected component c
of $G_{t'}$ is reachable from another object in a connected component
c' of $G_t$ during $T_p=[t,t']$, then it is easy to deduce from
Properties 5.1 and 5.2 that all objects in c are reachable from all
objects in c' during $T_p$. Accordingly, at the first step of the
reduction phase, we transform C to a graph $D_N$ whose vertices are
the connected components of the snapshots of C. To this end, first in
every $G_t \in C$ we replace all the vertices within the same
connected component c by a single vertex represented by c. Suppose
the collection of the connected components of $G_t$ is denoted by
$C_t$. Next, we create an edge from every $c \in C_t$ to every
$c' \in C_{t+1}$ if in C we find at least one edge from a vertex in c
to a vertex in c'. This transforms C into a directed acyclic graph
(DAG) $D_N$ with a significantly smaller number of vertices and edges
as compared to C while preserving reachability between objects. With
$D_N$, $o_j$ is reachable from $o_i$ during $T_p=[t_1,t_2]$ if the
connected component of $o_j(t_2)$ is reachable from the connected
component of $o_i(t_1)$. Therefore, to answer a reachability query,
we need to find the connected components of $o_i(t_1)$ and $o_j(t_2)$
at query time. As we explain later, we generate and use an external
hash table $\mathcal{H}_t$ for each time instance $t \in T$ to locate
the connected component corresponding to each vertex $o_i(t)$.
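
As an illustration of this first reduction step, the sketch below computes the connected components of each snapshot with a union-find structure and links components of consecutive snapshots. It is an in-memory simplification under assumed names (`snapshot_components`, `reduce_to_dag`) and an assumed input format; it relies on the observation that, in the TEN model, the only edges between consecutive snapshots are the holding edges, so two components are linked exactly when they share an object.

```python
def snapshot_components(objects, pairs):
    """Connected components of one snapshot G_t via union-find."""
    parent = {o: o for o in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for o in objects:
        groups.setdefault(find(o), set()).add(o)
    return [frozenset(g) for g in groups.values()]


def reduce_to_dag(objects, contacts, times):
    """Build component vertices (t, members) and the DAG edges between
    components of consecutive time instances."""
    comps = {t: snapshot_components(objects, contacts.get(t, []))
             for t in times}
    edges = set()
    for t, t_next in zip(times, times[1:]):
        for c in comps[t]:
            for c_next in comps[t_next]:
                # A holding edge o(t) -> o(t+1) links the two components
                # exactly when they share at least one object.
                if c & c_next:
                    edges.add(((t, c), (t_next, c_next)))
    return comps, edges


objs = ["o1", "o2", "o3", "o4"]
toy = {0: [("o1", "o2")], 1: [("o2", "o4"), ("o3", "o4")]}
comps, edges = reduce_to_dag(objs, toy, [0, 1])
print(sorted(sorted(c) for c in comps[1]))
print(len(edges))
```

A per-time lookup from object to component, which the text realizes as an external hash table, could be derived from `comps[t]` in one pass.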

The second step of the reduction phase merges identical connected
components in consecutive $G_t$s over time. If a set of objects
$O' \subseteq O$ are reachable from each other (and only from each
other) during a time interval $T' \subseteq T$, in $D_N$ they all
belong to snapshots of the same connected component during $T'$.
Therefore, to further reduce the size of $D_N$ we can keep one copy of
such a connected component during $T'$ and consider it as the
connected component of the objects in $O'$ during the entire $T'$. For
example, in Figure 4(b) $c_5$ and $c_7$ are snapshots of the same
connected component during $T'=[3,4]$ and can be merged. To
generalize, assume a set of connected components
$c_t \in C_t, c_{t+1} \in C_{t+1}, \ldots, c_{t+n} \in C_{t+n}$ all
have the same members $O'$, and $T'=[t,t+n]$. In such a case, we
remove $c_t, \ldots, c_{t+n-1}$ and connect the parent of $c_t$ in
$D_N$ (say a connected component in $C_{t-1}$ denoted by $c'$) to
$c_{t+n}$ by a weighted edge $e(n)$. We call $e(n)$ an aggregated
edge, where the weight captures the fact that for the next n time
instances, $c'$ is only reachable to objects in $O'$. Figure 5 shows
$D_N$ from Figure 4(b) after this step of reduction: $c_5$ is removed,
and $c_4$ and $c_3$ are connected to $c_7$ by aggregated edges $e(2)$
and $e'(2)$. This reduction can significantly shrink $D_N$, especially
when the sampling rate for object positions is high relative to the
objects' moving speed.
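
The merging of identical consecutive components can be sketched as a run-detection pass over the per-snapshot components. The code below is a simplified illustration under assumed names and data layout: it only identifies the runs of identical membership, and the rewiring of parent components with aggregated edges $e(n)$ is deliberately left out.

```python
def merge_runs(comps, times):
    """Collapse runs of identical components over consecutive times.

    comps: dict t -> iterable of frozensets (components of snapshot t).
    Returns a list of (members, t_first, t_last) triples; a triple with
    t_last > t_first stands for several snapshots merged into one vertex.
    """
    merged = []
    open_runs = {}  # members -> t_first of the run that is still open
    prev_t = None
    for t in times:
        current = set(comps[t])
        # Close every run whose membership no longer exists at time t.
        for members in list(open_runs):
            if members not in current:
                merged.append((members, open_runs.pop(members), prev_t))
        # Open a run for every component that is new at time t.
        for members in current:
            open_runs.setdefault(members, t)
        prev_t = t
    for members, t_first in open_runs.items():
        merged.append((members, t_first, prev_t))
    return merged


ab, c, abc = frozenset("ab"), frozenset("c"), frozenset("abc")
runs = merge_runs({0: [ab, c], 1: [ab, c], 2: [abc]}, [0, 1, 2])
for members, first, last in sorted(runs, key=lambda r: (r[1], sorted(r[0]))):
    print(sorted(members), first, last)
```

In this toy input the components {a, b} and {c} each persist over two snapshots and would be stored once, while {a, b, c} appears only at the last time instance.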

5.1.2.2 Transforming by Augmentation. 

In order to find a path between two connected components
$c_i \in C_t$ and $c_j \in C_{t'}$, one can simply expand $D_N$
starting from $c_i$ and check whether a path that reaches $c_j$ can
be found. Although $D_N$ is much smaller than C, such an expansion
can still take a long time to terminate. Hence, we propose to
precompute reachability between certain vertices of $D_N$ to enable
quick traversal of $D_N$.

In particular, we propose to precompute reachability during different
predefined time intervals. To this end, we break T into a set of
disjoint intervals $I_1, I_2, \ldots, I_n$ with equal length L, and
precompute reachability between vertices in $C_{t_a}$ and $C_{t_b}$
for each $I_i=[t_a,t_b]$. Accordingly, $D_N$ is augmented with a new
directed edge from every connected component $c \in C_{t_a}$ to every
connected component $c' \in C_{t_b}$ if there is a path of length L
from c to c'. We call such edges long edges and weight them by L,
which indicates the number of time instances that they encompass. The
resulting augmented hyper graph $H_N$ can be considered as the union
of $D_N$ with a new graph consisting of long edges, each with weight
L. We term the latter graph the contact network at the L-th
resolution and denote it by $D_{N_L}$. Accordingly, $D_N$ can be
considered as the contact network at the first resolution, or
$D_{N_1}$. One can extend this idea and precompute reachability at
other time intervals to generate a multi-resolution graph
$H_N = D_N \cup D_{N_{L_1}} \cup D_{N_{L_2}} \cup \ldots \cup D_{N_{L_k}}$.
However, this can significantly increase the number of edges if
overdone and hence adversely affect the efficiency of query
expansion. In Section 6, we experimentally select the optimal
resolutions for $H_N$. Figure 6 depicts $D_{N_3}$ for the $D_N$ shown
in Figure 5.
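
The long edges of a single interval can be sketched as a bounded graph search from every component at the start of the interval. The representation below (vertices as `(t, label)` pairs, an adjacency dictionary, the function name `long_edges`) is assumed for illustration and the sketch works in memory; it also assumes the interval end is strictly later than its start.

```python
def long_edges(adj, vertices_at, t_a, t_b):
    """Long edges from components at t_a to components at t_b (t_a < t_b).

    adj: dict vertex -> iterable of successor vertices in D_N, where a
         vertex is a (t, label) pair (hypothetical representation).
    vertices_at: dict t -> list of vertices at time t.
    Returns (source, target, weight) triples with weight = t_b - t_a.
    """
    weight = t_b - t_a
    result = []
    for src in vertices_at[t_a]:
        frontier, seen = [src], {src}
        while frontier:
            v = frontier.pop()
            for w in adj.get(v, ()):
                # Never walk past the end of the interval.
                if w[0] <= t_b and w not in seen:
                    seen.add(w)
                    frontier.append(w)
        for v in seen:
            if v[0] == t_b:
                result.append((src, v, weight))
    return result


adj = {
    (0, "x"): [(1, "x")],
    (0, "y"): [(1, "yz")],
    (0, "z"): [(1, "yz")],
    (1, "x"): [(2, "x")],
    (1, "yz"): [(2, "y"), (2, "z")],
}
at = {0: [(0, "x"), (0, "y"), (0, "z")], 2: [(2, "x"), (2, "y"), (2, "z")]}
for edge in sorted(long_edges(adj, at, 0, 2)):
    print(edge)
```

Repeating the call for each interval $I_i$ and for each chosen length L would yield the edge sets of the different resolutions.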

5.1.3 Disk Placement 

We distinguish two cases in the traversal of $H_N$, which is a
disk-resident hyper graph. In the first case, internal memory can
hold $c \times |V(H_N)|$ values, where $V(H_N)$ is the set of vertices
in $H_N$ and c is a small constant (12.375) [15]. In this case, it is
possible to construct the DFS tree of the graph, maintain it in
internal memory, and traverse it during query processing to verify
reachability. In the second case, the aforementioned assumption on
the number of vertices does not hold. For this case, we adopt the
idea of external BFS presented in [12] to enable efficient retrieval
of vertices during query processing. Similar to [12], we partition
$H_N$ and place the vertices within the same partition on consecutive
disk blocks. However, we adapt the technique to directed graphs, as
$H_N$ is a DAG. To this end, we first sort the vertices of $H_N$ in
topological order, which is the same order in which $H_N$ is
traversed during reachability query processing. Notice that finding
such an order is trivial, as the vertices of $H_N$ are created over T
in topological order (vertices in $C_t$ are generated before those of
$C_{t+1}$). Afterward, from each vertex v we find all the vertices U
with a shortest distance of at most $d_p$ from v, i.e., vertices at a
depth of at most $d_p$ from v. The vertices in $U \cup \{v\}$ are
reachable from v and form a partition $p_v$. We term v the root of
$p_v$. We iterate over the vertices and create a partition rooted at
a vertex if it is not already assigned to a partition. Notice that
only the edges in $D_N$ are considered in creating the partitions, and
hence long edges are ignored during the partitioning process to
preserve the temporal locality of graph vertices within the same
partition. The partitions are placed on disk in the same order in
which they are generated.

Figure 7: ReachGraph for the contact network in Figure 5
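
The partitioning loop described above can be sketched as follows. This is an illustrative in-memory version with assumed names; it additionally assumes that a vertex stays in the first partition that reaches it and that the search does not continue through vertices already assigned elsewhere, two details the text leaves open.

```python
from collections import deque

def partition_vertices(order, adj, d_p):
    """Assign vertices to partitions rooted at unassigned vertices.

    order: vertices in topological (time) order.
    adj:   dict vertex -> successors, with long edges excluded.
    d_p:   maximum depth of a partition below its root.
    Returns the partitions in the order in which they are created.
    """
    partition_of = {}
    partitions = []
    for root in order:
        if root in partition_of:
            continue  # already covered by an earlier partition
        pid = len(partitions)
        members = [root]
        partition_of[root] = pid
        queue = deque([(root, 0)])
        while queue:
            v, depth = queue.popleft()
            if depth == d_p:
                continue  # do not expand below the depth limit
            for w in adj.get(v, ()):
                if w not in partition_of:
                    partition_of[w] = pid
                    members.append(w)
                    queue.append((w, depth + 1))
        partitions.append(members)
    return partitions


chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
print(partition_vertices(["a", "b", "c", "d", "e"], chain, 2))
# [['a', 'b', 'c'], ['d', 'e']]
```

Writing the returned partitions to consecutive disk blocks in list order would mirror the placement order stated in the text.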

The final index for our running example is shown in Figure 7, where
the long edges are denoted by $e_i(3)$, $i=1,\ldots,4$, and the
aggregated edges by $e(2)$ and $e'(2)$. The hyper graph $H_N$ and the
hash tables which associate objects with the partitions of $H_N$ are
located on disk. Each hash table $\mathcal{H}_t$ locates the
partition which contains $o_j(t)$ given object $o_j$ and the time
instance t. In this example, five partitions $p_0, p_1, \ldots, p_4$
are generated, whose connected component members are
$\{c_0,c_3,c_4\}$, $\{c_1\}$, $\{c_2\}$, $\{c_6,c_8,c_9\}$ and
$\{c_7\}$, respectively. The members of the connected components are
placed within the vertices of $H_N$, as we discuss in the next
section. Although not shown in the figure, we store the reverse graph
of $D_{N_1}$ on disk as well, i.e., if $e=(u,v) \in D_{N_1}$ then we
add $e'=(v,u)$ to $H_N$. This enables efficient bidirectional
traversal of $H_N$, as we discuss in the next section. Finally, a
hash table is stored in main memory to enable fast lookup of
$\mathcal{H}_t$ for a given t and consequently finding the partition
of $H_N$ which includes the query source (destination) at the
beginning (end) of the query interval on disk.



5.2 Query Processing 

Consider a reachability query $q: o_i \xrightarrow{T_p} o_j$ where
$T_p=[t_1,t_2]$. To process q, one can first find the vertices $v_1$
and $v_2$ in $H_N$ which represent the connected components of $o_i$
and $o_j$ at $t_1$ and $t_2$, respectively. Afterward, starting from
$v_1$, $H_N$ can be traversed by either BFS or DFS to visit all the
vertices at a depth of at most $|t_2-t_1|$ from $v_1$. $o_j$ is
reachable from $o_i$ during $T_p$ if and only if $v_2$ is among the
visited vertices. Unfortunately, this approach may visit a huge
number of vertices, especially when $t_1 \ll t_2$. In this section,
we propose two ideas which significantly reduce the number of visited
vertices during $H_N$ traversal. First, we leverage the
multi-resolution index to traverse $H_N$. Consequently, whenever
possible the long edges with the largest weights are taken during
traversal (the traversal is performed on the higher resolutions
first) to enable fast traversal of $H_N$. Second, motivated by the
transitivity property (Property 5.2), we traverse $H_N$ from both
directions to find a possible contact path between the query source
and destination faster. In particular, $H_N$ is traversed forward
starting from the query source and in parallel it is traversed
backward on the reverse of $D_N$ starting from the query destination.
The traversal terminates in two cases: either an object which is
reachable from the query source and reachable to the query
destination is found, or $H_N$ has been traversed in both directions
until the bidirectional traversal stops at the middle of the query
interval.

As a counterpart to traversal algorithms for memory-resident graphs,
traversal algorithms for external graphs are studied in the
literature as well [12, 17]. We denote external BFS and DFS by E-BFS
and E-DFS, respectively. Although both E-DFS and E-BFS can be adopted
to traverse $H_N$, we adopt E-BFS to enable bidirectional traversal
of $H_N$. Accordingly, our ReachGraph query processing works by
performing E-BFS in parallel from $v_1$ and $v_2$, where the search
from $v_2$ traverses the reverse graph of $D_{N_1}$. Assume the set
of objects in vertices visited during forward traversal, i.e.,
traversal originating from $v_1$, is denoted by $O_F$. Accordingly,
we denote the set of objects in vertices visited during backward
traversal by $O_B$. The traversal terminates either when
$O_B \cap O_F$ becomes non-empty or when all the vertices reachable
from $o_i$ during $[t_1,\frac{t_1+t_2}{2}]$ and all the vertices
reachable to $o_j$ during $[\frac{t_1+t_2}{2},t_2]$ are traversed. In
the first case the query destination is reachable from the source,
while this is not true in the latter case. A partition is retrieved
and buffered during traversal to enable in-memory lookup of some of
the future vertices. Older partitions in memory can be discarded when
there is not enough space for new partitions. During forward
traversal, if a vertex is connected to long edges, the edges with the
largest weight are traversed and the other edges are ignored. We term
this approach Bidirectional Multi-resolution BFS or BM-BFS.

The pseudocode of the BM-BFS technique is presented in Algorithm 2.
The algorithm first finds the vertices $v_1, v_2 \in H_N$ in lines
2-3. The function FindVertex(p, o, t) gets a partition p, an object o
and a time instance t and returns the vertex of $H_N$ which contains
$o(t)$. Afterward, two queues are initialized for the forward and
backward traversal of the input graph in line 4. $O_F$ and $O_B$ are
also initialized in line 5. We denote the set of objects whose
instances are included in v by $O_v$. The algorithm runs the forward
(line 7) and backward (line 8) traversals in parallel by running the
ProcessQueue procedure until both $Q_F$ and $Q_B$ become empty or
reachability is verified. With the ProcessQueue procedure, the vertex
$v_h$ at the head of either queue is extracted in line 2. Each object
in $O_{v_h}$ is examined to check whether it is already visited in
the reverse traversal (lines 5-8). If this is the case, the query
destination is reported reachable from the query source. Afterward,
each child of $v_h$ is added to the traversal queue to enable the
next steps of traversal. The Child(v, direction) procedure returns
the edges at the highest resolution originating from v whose end
points are the vertices representing time instances
$t \in [t_1,\frac{t_1+t_2}{2}]$ and $t \in [\frac{t_1+t_2}{2},t_2]$
for the forward and backward traversal directions, respectively. The
following theorem proves the correctness of BM-BFS.
Algorithm 2 BM-BFS
procedure BM-BFS(o_i, o_j, T_p = [t_1, t_2], H_N)
    v_1 = FindVertex(H_{t_1}(o_i), o_i, t_1)
    v_2 = FindVertex(H_{t_2}(o_j), o_j, t_2)
    Q_F.push(v_1), Q_B.push(v_2)
    O_F.add(O_{v_1}), O_B.add(O_{v_2})
    while !Q_F.isEmpty() || !Q_B.isEmpty() do
        ProcessQueue(O_B, Q_F, F)
        ProcessQueue(O_F, Q_B, B)
    end while
    return false
end procedure

procedure ProcessQueue(O, Q, direction)
    if !Q.isEmpty and v_h = Q.pop() is not visited before then
        for o ∈ O_{v_h} do
            if O.contains(o) then
                return true
            end if
        end for
        for c ∈ Child(v_h, direction) do
            Q.add(c)
        end for
    end if
end procedure
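
For readers who prefer executable code, the following is a minimal in-memory sketch in the spirit of Algorithm 2. It is not the disk-based E-BFS used in this paper: partitions, buffering and the hash tables are omitted, and the graph layout (vertices as `(t, members)` pairs, adjacency lists of `(neighbor, weight)` pairs) as well as all function names are assumptions made for illustration.

```python
from collections import deque

def bm_bfs(fwd, rev, v1, v2, t1, t2):
    """Bidirectional multi-resolution BFS sketch.

    fwd / rev: dict vertex -> list of (neighbor, weight) in the forward
               and reversed graph; a vertex is a (t, members) pair with
               members a frozenset of object ids (assumed layout).
    v1 / v2:   vertices holding the source at t1 and the destination at t2.
    """
    mid = (t1 + t2) / 2

    def children(v, graph, lo, hi):
        # Keep edges ending inside [lo, hi]; among them follow only the
        # ones with the largest weight (highest resolution).
        ok = [(w, wt) for w, wt in graph.get(v, ()) if lo <= w[0] <= hi]
        if not ok:
            return []
        best = max(wt for _, wt in ok)
        return [w for w, wt in ok if wt == best]

    def step(queue, seen, mine, other, graph, lo, hi):
        # Expand one vertex; report whether the two object sets now meet.
        if not queue:
            return False
        v = queue.popleft()
        for w in children(v, graph, lo, hi):
            if w not in seen:
                seen.add(w)
                mine.update(w[1])
                queue.append(w)
        return bool(mine & other)

    q_f, q_b = deque([v1]), deque([v2])
    seen_f, seen_b = {v1}, {v2}
    o_f, o_b = set(v1[1]), set(v2[1])
    if o_f & o_b:
        return True
    while q_f or q_b:
        # Forward half covers [t1, mid], backward half covers [mid, t2].
        if step(q_f, seen_f, o_f, o_b, fwd, t1, mid):
            return True
        if step(q_b, seen_b, o_b, o_f, rev, mid, t2):
            return True
    return False


def V(t, *members):
    return (t, frozenset(members))

# Toy DAG (assumed): o1-o2 in contact at t=0, o2-o3-o4 connected at t=1.
fwd = {V(0, "o1", "o2"): [(V(1, "o1"), 1), (V(1, "o2", "o3", "o4"), 1)],
       V(0, "o3"): [(V(1, "o2", "o3", "o4"), 1)],
       V(0, "o4"): [(V(1, "o2", "o3", "o4"), 1)]}
rev = {}
for u, outs in fwd.items():
    for w, wt in outs:
        rev.setdefault(w, []).append((u, wt))

print(bm_bfs(fwd, rev, V(0, "o1", "o2"), V(1, "o2", "o3", "o4"), 0, 1))  # True
print(bm_bfs(fwd, rev, V(0, "o4"), V(1, "o1"), 0, 1))                    # False
```

Unlike the pseudocode, this sketch propagates the positive result of each expansion step to the caller explicitly, which is one possible reading of how a successful ProcessQueue call ends the search.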



Theorem 5.3. BM-BFS verifies the reachability from the query source
to the destination during the query interval.

Proof. First, assume that $H_N$ only includes one resolution, i.e.,
$H_N=D_{N_1}$. $H_N$ is a DAG whose vertices are topologically sorted
and time stamped. The forward traversal visits all the vertices
representing contacts whose validity interval is a subset of
$[t_1,\frac{t_1+t_2}{2}]$ and which are reachable from the query
source. Accordingly, the backward traversal visits all the vertices
representing contacts whose validity interval is a subset of
$[\frac{t_1+t_2}{2},t_2]$. Therefore, if a path p from $v_1$ to $v_2$
exists, then the vertices in p are discovered after forward and
backward traversal of $H_N$. In addition, the vertices in p are time
stamped and therefore the order of the vertices in p is preserved
during traversal of $H_N$. When we consider long edges during
traversal, some vertices of $H_N$ which represent specific time
instances may not be visited. However, general connectivity of the
graph is preserved at all the resolutions, and therefore by taking
long edges the query can still be verified correctly. Also, based on
the transitivity property (Property 5.2), the early termination
condition accurately terminates the traversal. This completes the
proof. □

Assume that each partition includes instances of $U_p$ distinct
objects and each disk block holds $b_p$ partitions on average. The
following theorem states the complexity of ReachGraph query
processing and index construction ($|T'_p|$ is defined in Theorem
4.1).

Theorem 5.4. The ReachGraph index can be constructed with
$O(|O||T|)$ IOs. The query processing IO complexity is
$O\left(\frac{\cdots}{U_p \times b_p}\right)$.

We skip the details of the proof due to lack of space. 

GRAIL [18] is one of the most efficient graph reachability approaches
for memory-resident graphs. It works based on the idea of randomized
interval labeling of graph vertices. Table 1 compares the index
construction and query time complexity of ReachGrid and ReachGraph
with that of GRAIL when adopted on the disk-resident $D_N$ to process
reachability queries. Our approaches significantly outperform GRAIL
because of efficient disk placement and also early termination of
queries ($|T'_p| < |T_p|$). With GRAIL, d is a small constant, namely
the number of intervals assigned to each graph vertex, and $n_r$ is
the average number of objects which are reachable from any object
$o \in O$ at each time instance $t \in T$.
6. EXPERIMENTS

We perform our experiments on both synthetic and real datasets
modeling the contacts between moving objects, which are either
vehicles or individuals. Our synthetic datasets are generated by two
different data generators. The first data generator, GMSF [3], models
the movement of individuals in an environment of 100 km², assuming
their movement patterns follow the random waypoint model with an
average speed of 2 m/s. The trajectory samples are captured every 6
seconds. Random waypoint is one of the most used models in the
literature to model individuals' movement. With this model, every
individual selects a random destination and speed and then moves
toward that destination. Afterward, she selects another random
destination and moves toward it [11]. The second data generator is
the Brinkhoff generator, which is commonly used for generating
realistic moving objects [4]. We generated the trajectories of a
constant set of vehicles moving on the road network of the city of
San Francisco, covering an area of approximately [...]. The vehicle
locations are recorded on average every 5 seconds. The reason for
using two different synthetic data generators is to study the
difference in reachability query processing between different
categories of moving objects, i.e., individuals and vehicles. In
particular, vehicles are restricted to move on a road network while
individuals can move to any point in the environment. With the
Brinkhoff generator, we generate 1000, 2000 and 4000 vehicle
trajectories. We denote these datasets by VN1k, VN2k and VN4k,
respectively, and term the collection VN datasets. With GMSF, we
generate 10,000, 20,000 and 40,000 individuals' trajectories. We term
these datasets RWP10k, RWP20k and RWP40k, respectively, and call the
set of these datasets RWP datasets. The reason for generating more
object trajectories with GMSF is that its objects are distributed
over the entire space, as opposed to the Brinkhoff generator, with
which objects only move on the road network. With both generators, we
generate trajectories for a duration of four months (more than 119
days). Accordingly, RWP and VN datasets include more than 1,700,000
and 2,048,000 time instances, respectively. The size of the data for
each dataset is given in Table 2.

Our real dataset captures the movements of vehicles in the city of
Beijing. This dataset covers the GPS tracks of more than 2500
distinct vehicles collected during a day. The vehicles' GPS tracks
cover an area of approximately 600 km². The vehicle locations are
recorded every minute and further interpolated to reflect the
locations for every five seconds. Unfortunately, because of the small
scale of this dataset we only use it in a subset of experiments.

Our experimental system specification is presented in Table 3. For
each experiment setting, we run the algorithm 400 times to compute
the average values. The query sources and destinations are selected
randomly, and the query interval is selected as a random interval
whose length is a random number between 150 and 350 unless otherwise
stated. We presume vehicles contact each other by communicating over
the DSRC protocol, which has an effective range of 300 meters.
Similarly, we assume individuals make contacts by communicating over
the Bluetooth protocol, which has a typical range of 25 meters.
Therefore, we set $d_T=25$ for RWP and $d_T=300$ for VN datasets.

Finally, to measure the performance of reachability query processing
we measure the number of random IOs. Hence, the sequential IOs are
normalized to random accesses by assuming that each random access
costs as much as 20 sequential accesses [6]. Notice that these
numbers are system dependent; however, the general trends in the
results should hold for machines with different settings as well. The
rest of this section is organized as follows. We first evaluate the
efficiency of the ReachGrid and ReachGraph approaches, respectively.
Thereafter, we present the empirical comparison between ReachGrid and
ReachGraph. Finally, we compare our approaches with existing graph
reachability algorithms.
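
The normalization of sequential to random IOs amounts to a one-line conversion. The helper below is only an illustration with an assumed name and an example input; the ratio of 20 is the value stated above.

```python
def normalized_io(random_ios, sequential_ios, ratio=20):
    """Express a mix of accesses as an equivalent number of random IOs,
    assuming one random access costs as much as `ratio` sequential ones."""
    return random_ios + sequential_ios / ratio

# Example (made-up counts): 10 random and 400 sequential accesses.
print(normalized_io(10, 400))  # 30.0
```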

6.1 ReachGrid 

In this section, we first focus on the efficiency of the index con- 
struction and then query processing step of our ReachGrid. 

6.1.1 Index Construction 

The performance of ReachGrid depends on the resolution of the
temporal and spatial grids which quantize time interval T and
environment E, respectively. There is a tradeoff in selecting both
temporal and spatial resolutions. By increasing either of the
resolutions, the number of random accesses to disk blocks increases
when processing a reachability query and hence the number of IOs
increases. The reason is that the locality in time and space is not
fully leveraged. On the other hand, decreasing the resolution of the
grids results in the placement of a huge number of trajectory
segments within a grid cell. As a result, many trajectory segments
which are irrelevant for query processing are processed for each
reachability query. This increases the number of IOs during query
processing.

Here, we empirically optimize the grid resolutions by varying both
temporal and spatial grids and selecting a combination which
minimizes the number of IOs when processing reachability queries.
There is a huge number of possible combinations of temporal and
spatial resolutions, and therefore we assume the same resolution for
all the spatial grids $C_i$ to reduce the number of possible
combinations. We vary the temporal resolution from 5 to 80 for both
datasets and the spatial resolution from 128m to 10km (17km) for RWP
(VN) datasets. We denote the optimal spatial and temporal resolutions
by $R_s$ and $R_t$, respectively. With RWP datasets, $R_s=1024$m and
$R_t=20$, and with VN datasets, $R_s=17$km and $R_t=20$. With VN
datasets, the optimal ReachGrid indexes have lower resolutions than
those of RWP datasets. The reason is that VN datasets capture the
movement of fewer objects as compared to RWP datasets and hence
spatial grids are larger to place more objects within the same cell.
Figures 8 (a) and (b) show how the IO count varies when the temporal
and spatial resolutions vary for RWP datasets, respectively. With
Figure 8 (a) the temporal resolution is 20 and with Figure 8 (b) the
spatial resolution equals 1024m. Because of lack of space and the
fact that the results for VN datasets follow the same pattern, we do
not show the results for VN datasets.

We also measured the time required to construct the optimal ReachGrid
indices. The results are shown in Figures 9 (a) and (b) for RWP and
VN datasets, respectively. The x-axis shows the length of the time
period T over which the ReachGrid index is constructed. All these
intervals share the same starting point but have different ending
points. In all cases, the index construction time is less than 4.3
hours. As expected, increasing the number of objects and the duration
of T makes index construction slower.

Figure 9: ReachGrid construction time

6.1.2 Query Processing

To evaluate the efficiency of online ReachGrid query processing, we
compare ReachGrid with a naive approach, termed SPJ, which generates
the contact network C' relevant to the query interval on the fly and
afterward traverses it to verify reachability between the query
source and destination. SPJ generates C' by retrieving all the
trajectory segments which overlap with the query interval. Based on
our experiments, ReachGrid outperforms SPJ by at least 96% for all
RWP and VN datasets. The reason is that the ReachGrid online query
processing algorithm avoids constructing the portion of the contact
network which is irrelevant for query processing by intelligent
traversal of the contact network.

6.2 ReachGraph 

Here, we first study the efficiency of index construction and af- 
terward the online query processing approaches of ReachGraph. 

6.2.1 Index Construction 

In this section, we first focus on evaluating the efficiency of in- 
dex construction for the basic contact network (Dn) and afterward, 
evaluate the efficiency of the augmentation step. We conclude this 
section by studying the placement of ReachGraph on disk. 

6.2.1.1 Contact Network Size. 

Here, we empirically measure the contact network size by counting the
number of vertices ($|V|$) and edges ($|E|$) of the contact network
($D_N$) when generating the contact network for different time
intervals T. The results for RWP datasets are shown in Figures 10 (a)
and (b) for edges and vertices, respectively. The results for VN
datasets follow a similar pattern and are omitted due to space
constraints. The x-axis represents the length of the time interval T
during which the contact network is constructed, assuming that all
time intervals start from the same time instance, i.e.,
$T=[0,|T|]$. As expected, $|E|$ and $|V|$ increase when $|T|$
increases. The reason is that the number of contacts and accordingly
the number of edges increase when $|T|$ increases. Similarly, the
number of edges and vertices increases when the number of objects
increases. The most important observation from this experiment is
that the contact network can become prohibitively large to reside in
main memory. In particular, the numbers of edges and vertices are
more than 17,466 million and 10,545 million for RWP40k, respectively.

Figure 10: Contact network edges (a) and vertices (b)

Figure 11: Contact network ($D_N$) construction time

Next, we measure the efficiency of the reduction step proposed in
Section 5. To this end, we compare the number of vertices and edges
of $C_N$ with that of $D_N$ for the same settings as the experiments
in this section. With RWP datasets, over all the cases the number of
vertices (edges) of $D_N$ is on average 81% (80%) less than that of
$C_N$. Similarly, with VN datasets, over all the cases the number of
vertices (edges) of $D_N$ is 64% (61%) less than that of $C_N$. The
results show that the reduction step can significantly reduce the
size of the contact network represented in TEN.

6.2.1.2 Contact Network Construction Time. 

In this section, we measure the construction time of $D_N$ for
different time intervals T. The results are shown in Figures 11 (a)
and (b) for RWP and VN datasets, respectively. For all datasets,
increasing the number of objects and $|T|$ increases the construction
time. The reason is that more contacts need to be processed in order
to create the contact network. With our experimental setting, the
construction time for all datasets is less than 14 days. Although
this running time is large, it reflects the time it takes to
construct the entire contact network over T. However, it is also
possible to construct the contact network incrementally over time by
acquiring the objects' positions at new time instances and appending
the corresponding new vertices and edges to the previously
constructed contact network.

6.2.1.3 Multi-resolution Graph. 

In this section, we study the performance of constructing the
contact network at various resolutions. To this end, we measure
the average vertex degree at different resolutions
(Dn^2, Dn^4, ..., Dn^32). The average degree at a given resolution
only considers vertices which have at least one edge at that
resolution. Table 4 shows the results for RWP40k and VN4k, which
have the largest number of objects among the RWP and VN datasets,
and for VNr, which corresponds to our real dataset. As the contact
network resolution increases, the average degree of vertices in the
corresponding resolution increases. The reason is that over larger
time intervals, objects are reachable from more objects and hence
more long edges are introduced at higher resolutions. VNr has a
significantly smaller average vertex degree than the other datasets,
because the contact network Dn for this dataset is much smaller than
that of the other datasets. We decide the optimal number of
ReachGraph resolutions in Section 6.2.1.4.

Table 4: Average vertex degree at each resolution of Dn.
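
The degree statistic reported here can be computed as in the sketch
below. It assumes that the edges of one resolution are available as a
list of vertex pairs and that both endpoints of an edge count towards
the degree; the function name average_degree is hypothetical.

```python
def average_degree(edges):
    """Average degree over vertices that have at least one edge in `edges`.

    edges is an iterable of (u, v) pairs belonging to a single resolution.
    """
    degree = {}
    for u, v in edges:
        # Assumption: an edge contributes to the degree of both endpoints.
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    if not degree:
        return 0.0
    # Vertices without any edge at this resolution never enter `degree`,
    # so they are excluded from the average.
    return sum(degree.values()) / len(degree)
```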

6.2.1.4 Disk Placement. 

Here, we empirically optimize the placement of the multi-resolution
contact network graph on disk. ReachGraph has two parameters,
i.e., the number of resolutions and the depth of partitioning, which
need to be optimized in order to construct and place the index on
disk. We find the optimal values for both parameters empirically.
To this end, we vary the partition depth from 1 to 64 and the
number of resolutions from 1 to 7, and count the number of I/Os
for both datasets. Based on our experiments, the optimal partition
depth and number of resolutions are 32 and 6, respectively, i.e.,
dp = 32 and Hn = Dn^1 ∪ Dn^2 ∪ ... ∪ Dn^32.

Figure 12 shows how changing the depth of partitions varies the
number of I/Os for the RWP20k and VN2k datasets when processing
reachability queries (Hn includes the contact network at the first
six resolutions). Increasing the depth of partitions gives the
opportunity to buffer more vertices which will be visited in the
future and hence reduces the total number of I/Os. On the other hand,
if the partitions become too large, then many vertices that are
redundant for query processing are retrieved from disk, which
deteriorates the performance of query processing. Therefore, there
is a trade-off between partition depth and I/O count. A similar
trade-off is present between the number of ReachGraph resolutions
and I/O count.

6.2.2 Query Processing 

Here, we evaluate the efficiency of online ReachGraph query
processing. The goal of this experiment is to study how the
bidirectional traversal and multi-resolution index construction
techniques improve the performance of ReachGraph. To this end, we
compare the efficiency of the bidirectional multi-resolution
traversal (BM-BFS) approach with the bidirectional traversal (B-BFS)
and external DFS (E-DFS) approaches. B-BFS traverses Hn similarly to
BM-BFS but only at the single resolution of Dn. E-DFS is the naive
approach, which only checks whether there is a path on Hn from the
query source to the destination during the query interval. We select
E-DFS as the baseline approach as it is faster than E-BFS. Notice
that E-DFS does not investigate the members of the connected
components as opposed to BM-BFS and B-BFS, and therefore it only
finds the contact paths with the length of the query time interval.
The results for RWP20k and VN2k are shown in Figure 13. BM-BFS
outperforms E-DFS and B-BFS by more than 80% and 15%,
respectively, for both datasets. The reason is that it leverages long
edges to make traversal faster and at the same time investigates the
objects within connected components to stop the traversal as soon
as a contact path is found between the query source and destination.
B-BFS also outperforms E-DFS significantly because it terminates
the graph traversal as soon as a contact path is discovered between
the query source and destination.
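
To illustrate why bidirectional traversal can stop early, the sketch
below runs a forward search from the source and a backward search from
the destination and terminates as soon as the two meet. It is a
simplified, in-memory, single-resolution illustration in the spirit of
B-BFS, not our disk-based implementation: it ignores long edges,
connected-component membership and the query interval, and the function
name and the adjacency-map inputs are assumptions.

```python
from collections import deque


def bidirectional_reachable(successors, predecessors, source, target):
    """Return True if target is reachable from source in a directed graph.

    successors / predecessors map a vertex to an iterable of its
    out-neighbours / in-neighbours.
    """
    if source == target:
        return True
    seen_fwd, seen_bwd = {source}, {target}
    queue_fwd, queue_bwd = deque([source]), deque([target])
    while queue_fwd and queue_bwd:
        # Expand the smaller frontier first to keep both searches balanced.
        if len(queue_fwd) <= len(queue_bwd):
            queue, seen, other, adjacency = queue_fwd, seen_fwd, seen_bwd, successors
        else:
            queue, seen, other, adjacency = queue_bwd, seen_bwd, seen_fwd, predecessors
        for _ in range(len(queue)):
            vertex = queue.popleft()
            for neighbour in adjacency.get(vertex, ()):
                if neighbour in other:
                    return True  # the two searches met, so a path exists
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    # One side exhausted its frontier without meeting the other.
    return False
```

The predecessor map is simply the successor map with every edge
reversed, so it can be built once when the graph is loaded.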

6.3 ReachGrid vs. ReachGraph

In this section, we compare the efficiency of ReachGrid and
ReachGraph. We generate random queries with query intervals
of 100, 300 and 500 time instances and compare the number
of I/Os for the ReachGrid and ReachGraph (BM-BFS) approaches. The
results are shown in Figure 14 (a) and (b) for the RWP20K and VN2K
datasets, respectively. Based on our results, ReachGrid is
comparable with ReachGraph when the query interval is small. The
reason is that in such cases only a small portion of the contact
network needs to be traversed, which is placed on consecutive
blocks on disk and can be efficiently retrieved by ReachGrid.
Another important observation is that, in addition to the query
interval size, the distribution of objects also affects the
performance of ReachGrid. With the VN2k dataset, the objects are
located on the road network and within a small portion of the entire
environment E, as opposed to the RWP20k dataset, for which the
objects are almost uniformly distributed in E. As a result, with the
VN2k dataset ReachGraph significantly outperforms ReachGrid (by 63%
on average). The reason is that the ReachGrid spatial grid cannot
leverage spatial locality for non-uniform object distributions.

We also compare the CPU time of both approaches, i.e., the time
taken by the algorithms excluding retrievals from disk. The
results are shown in Figure 15 for the RWP20k and VN2K datasets.
As expected, ReachGraph has significantly lower CPU time because
its extensive offline precalculations avoid spatiotemporal joins
at query time.

Figure 14: ReachGrid vs. ReachGraph



Table 5: GRAIL vs. ReachGraph (denoted by RG) 

6.4 Comparison with Graph Reachability 

Here, we compare ReachGraph query processing with the existing
graph reachability techniques. In particular, we compare our
approach with GRAIL [18]. First, we consider contact datasets
which reside in memory. We compare the performance of ReachGraph
and GRAIL on the RWP20K and VN2K contact datasets with
|T|=1000, which are memory-resident datasets. GRAIL takes Dn
as input and verifies whether the query destination is reachable
from the query source. Table 5 (a) shows the results of this
comparison in terms of runtime for random queries with an interval
length of 300. GRAIL converges to simple DFS for reachability
queries when the source and destination are reachable. Therefore,
our approach outperforms GRAIL for VN2K, while this is not the case
for RWP20K, because VN2K contains more pairs of reachable
objects than RWP20K. With RWP20K, GRAIL is 30%
faster than ReachGraph. In sum, we conclude that our approach is
comparable with GRAIL for memory-resident contact datasets.

Next, we adapt GRAIL for disk-resident contact datasets and
compare the performance of GRAIL and ReachGraph in terms of the
number of I/Os. To this end, we issue the same queries on the
disk-resident contact datasets. We assume that with GRAIL the
vertices are placed on disk in the same order they are generated
during contact network construction. The results are shown in
Table 5 (b). As expected, our approach significantly outperforms
GRAIL for disk-resident datasets. In particular, it outperforms
GRAIL by 76% and 88% for the VN2K and RWP20K datasets,
respectively.

7. DISCUSSION 

In this section, we briefly discuss how our algorithms can be po- 
tentially extended to address more generic contact-network reacha- 
bility problems where the definition of "contact" is partly different. 
In particular, we discuss two cases. First, we consider uncertain 
contact networks and then, we focus on non-immediate contacts. 

Two objects oi, oj ∈ O make an uncertain contact with probability
p when their distance is less than dr, i.e., they transmit an item
with probability p. For example, with most viral diseases an
individual can infect another one with some probability p once in
proximity, where p depends on various factors such as the distance
between the individuals. Accordingly, a contact path P is also
probabilistic, with a probability equal to the product of the
probabilities of the contacts in P. We say oj is reachable
from oi during Tp if a contact path exists from oi to oj with a
probability of at least pr. We term the ReachGraph for uncertain
contact networks U-ReachGraph and briefly explain how one can
extend ReachGraph to U-ReachGraph. We skip the details of the
ReachGrid extension due to lack of space.

Both index construction and query processing with U-ReachGraph
are different from those of ReachGraph. For index
construction, each edge e of the TEN model is associated with a
weight representing the probability of contact between the objects
represented by the endpoints of e. For reduction, although the first
step cannot be applied to the TEN model for uncertain networks
(unless all the edges within a connected component are associated
with a probability of one), the second reduction step readily
applies to uncertain contact networks, provided that the contact
probabilities are taken into consideration. For augmentation, a long
edge e from vi to vj, where vi, vj ∈ Dn, must also be associated
with a probability p, where p is the probability of the contact path
with the highest probability from vi to vj during the time interval
spanned by e. With ReachGraph, we adopted graph traversal approaches
such as BFS to process a reachability query, whereas with
U-ReachGraph the contact path probability is also important.
Accordingly, with U-ReachGraph we adopt graph shortest path
algorithms to verify whether a contact path with a probability of at
least pr exists from the query source to the destination during the
query interval.
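
One way to realize this with a shortest path algorithm is sketched
below. Since the probability of a path is the product of its edge
probabilities, weighting each edge by the negative logarithm of its
probability turns the most probable path into the shortest one. The
choice of Dijkstra's algorithm, the function names, and the static
in-memory graph (which ignores the query interval) are assumptions made
for illustration; we do not prescribe a specific algorithm here.

```python
import heapq
from math import exp, log


def max_path_probability(graph, source, target):
    """Highest probability of any path from source to target (0.0 if none).

    graph maps a vertex to a list of (neighbour, probability) pairs,
    with 0 < probability <= 1. Vertices must be mutually comparable
    (e.g. all strings) because they break ties inside the heap.
    """
    best = {source: 0.0}   # smallest sum of -log(probability) found so far
    heap = [(0.0, source)]
    while heap:
        cost, vertex = heapq.heappop(heap)
        if vertex == target:
            return exp(-cost)
        if cost > best.get(vertex, float("inf")):
            continue  # stale heap entry
        for neighbour, probability in graph.get(vertex, ()):
            new_cost = cost - log(probability)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return 0.0


def is_reachable(graph, source, target, threshold):
    """True if some path reaches target with probability >= threshold."""
    return max_path_probability(graph, source, target) >= threshold
```

Because every edge probability is at most one, all transformed weights
are non-negative, which is the condition Dijkstra's algorithm requires.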

A non-immediate contact between oi and oj occurs when the
distance between the location of oi at time t and that of oj at time
t' (t < t') is less than a distance threshold and |t' - t| < Tt. Tt
is the lifetime of the item initiated by objects and [t, t'] is the
contact validity interval. For example, a person u carrying a virus
may spread it in a bus at time t and get off the bus. Later on,
another person v may get on the same bus at time t' and become
infected. The definitions in Section 3 readily apply to
non-immediate contact networks. Moreover, ReachGrid and ReachGraph
can also be readily adopted for non-immediate contact networks with
one exception. With regular contact networks, we perform a
spatiotemporal join between object trajectories to extract contacts
between objects, whereas with non-immediate contact networks the
replicated trajectories should be joined to produce contacts
between objects.
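
The non-immediate contact condition for a single pair of trajectory
samples can be written as the predicate below. The function and
parameter names are hypothetical, positions are assumed to be
two-dimensional points, and the sketch checks one pair of samples only;
it does not implement the trajectory replication or the join.

```python
from math import dist


def is_non_immediate_contact(pos_i, t, pos_j, t_prime, distance_threshold, lifetime):
    """Check the non-immediate contact condition for one pair of samples.

    pos_i is the location of object i at time t; pos_j is the location
    of object j at the later time t_prime.
    """
    return (
        t < t_prime                                  # j arrives after i
        and t_prime - t < lifetime                   # the item is still alive
        and dist(pos_i, pos_j) < distance_threshold  # same place, e.g. the same bus
    )
```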

8. CONCLUSION AND FUTURE WORK 

In this paper, for the first time we introduced and studied the
problem of reachability queries in disk-resident spatiotemporal
contact networks. We proposed two different indexing approaches,
ReachGrid and ReachGraph, to enable efficient reachability query 
processing. We have conducted an empirical study with both real 
and synthetic datasets to evaluate our proposed techniques. The ex- 
perimental results show that our proposed techniques outperform 
the existing reachability query processing approaches in contact 
networks by 76% on average. 

In the future, we plan to continue our study on reachability in 
uncertain and non-immediate contact networks. Moreover, we in- 
tend to extend the techniques proposed in this paper to consider 
item transmission delay in contacts. Finally, we plan to extend our 
proposed approaches to be applicable in cloud-computing environ- 
ments to further enhance the efficiency of query processing. 